Work around changes to urllib.parse.urlsplit #633

Merged
merged 5 commits into from
Aug 18, 2021


judahrand
Contributor

@judahrand judahrand commented Aug 3, 2021

Motivation

This attempts to fix the behaviour around question marks in URIs for S3 and GCS. urllib.parse.urlsplit is used, and a hack was put in place to replace ? with \n to avoid issues with query parameter splitting. However, the behaviour of urllib.parse.urlsplit has changed: \n, \t and \r are now all stripped from the URL before splitting.

https://bugs.python.org/issue43882

My solution is to instead use the 'null character'. This could technically appear in a GCS Blob name... but should it? To me this feels fairrrrrllllly safe?
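For anyone following along, here is a minimal sketch of the behaviour described above. It is not the code from this PR: the helper `split_with_placeholder` and the example URI are made up for illustration, and the result of the `\n` variant depends on whether the interpreter includes the change tracked in bpo-43882.

```python
from urllib.parse import urlsplit

# Made-up example: an S3 key that legitimately contains a question mark.
uri = "s3://bucket/report?final.csv"

# Plain urlsplit treats everything after the first '?' as the query string,
# so the key is cut short.
plain = urlsplit(uri)


def split_with_placeholder(uri, placeholder):
    """Hypothetical helper: hide '?' behind a placeholder, split, then restore it."""
    parts = urlsplit(uri.replace("?", placeholder))
    return parts.netloc, parts.path.replace(placeholder, "?")


# The old hack: on interpreters that strip '\n' before splitting, the
# placeholder is gone by the time we try to restore it, so the '?' is lost.
old_hack = split_with_placeholder(uri, "\n")

# The approach proposed here: the null character is not in the stripped set.
null_hack = split_with_placeholder(uri, "\x00")

print(plain.path, plain.query)
print(old_hack)
print(null_hack)
```

On an interpreter with the new stripping behaviour, `old_hack` should come back without the question mark, while `null_hack` keeps the full key.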

Checklist

Before you create the PR, please make sure you have:

  • [x] Picked a concise, informative and complete title
  • [x] Clearly explained the motivation behind the PR
  • [x] Linked to any existing issues that your PR will be solving
  • [x] Included tests for any new functionality
  • [x] Checked that all unit tests pass

Workflow

Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult for the reviewer, because they are no longer able to distinguish between code that has already been reviewed and unreviewed code.

@mpenkov
Collaborator

mpenkov commented Aug 14, 2021

Thank you for this contribution. I was wondering what's going on with those question marks... this PR makes it obvious where the problem was.

However, I'm not sure if using the null character is the best way around it. It's risky, because the same thing that happened to the newline can happen to the null: the library implementation can change (I wonder why they did this, but it's out of our control) and then we'll be back to where we started.

@mpenkov
Collaborator

mpenkov commented Aug 15, 2021

@judahrand I improved your PR slightly by replacing null with a configurable placeholder.

@piskvorky Can you please have a look at this? Does this solution look sane to you?
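A rough sketch of what a configurable placeholder could look like, for readers who have not opened the diff. This is an illustration under assumptions, not the merged code: the names `QUESTION_MARK_PLACEHOLDER` and `split_keeping_question_marks`, and the placeholder value itself, are hypothetical.

```python
from urllib.parse import SplitResult, urlsplit

# Hypothetical module-level default. Because it is ordinary printable text,
# it does not rely on urlsplit leaving any control character alone, and it
# can be overridden if it ever collides with a real key.
QUESTION_MARK_PLACEHOLDER = "///QUESTION_MARK_PLACEHOLDER///"


def split_keeping_question_marks(url, placeholder=QUESTION_MARK_PLACEHOLDER):
    """Split url, treating '?' as part of the path rather than a query marker."""
    if placeholder in url:
        # Refuse to guess: restoring '?' would corrupt a key that already
        # contains the placeholder text.
        raise ValueError("placeholder %r already appears in %r" % (placeholder, url))
    parts = urlsplit(url.replace("?", placeholder))
    path = parts.path.replace(placeholder, "?")
    return SplitResult(parts.scheme, parts.netloc, path, "", "")


result = split_keeping_question_marks("gs://my-bucket/dir/file?.txt")
print(result.netloc, result.path)
```

The trade-off is that a key containing the placeholder text is rejected outright, which is why making it configurable matters: callers who hit that case can pick a different value.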

@mpenkov mpenkov merged commit 98e50fb into piskvorky:develop Aug 18, 2021
@mpenkov
Collaborator

mpenkov commented Aug 18, 2021

Merged! Thank you @judahrand for taking care of this.
